Theoretical Justification
Imitation Learning from Imperfection: Theoretical Justifications and Algorithms
Imitation learning (IL) algorithms excel at acquiring high-quality policies from expert data for sequential decision-making tasks. However, their effectiveness is hampered when expert data is limited. To tackle this challenge, a novel framework called (offline) IL with supplementary data has been proposed, which enhances learning by incorporating an additional yet imperfect dataset obtained inexpensively from sub-optimal policies. Nonetheless, learning remains challenging due to the potential inclusion of out-of-expert-distribution samples. In this work, we propose a mathematical formalization of this framework, uncovering its limitations.
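The framework described in this abstract can be pictured with a minimal, self-contained sketch: behavioral cloning on the union of scarce expert data and cheap sub-optimal data, with the supplementary samples down-weighted. This is not the paper's algorithm; the toy task, the noise level, and the weighting factor `beta` are all hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D setup: states in R, two discrete actions {0, 1}.
# "Expert" data is scarce; "supplementary" data comes from a noisier policy.
expert_s = rng.normal(0.0, 1.0, size=20)
expert_a = (expert_s > 0).astype(int)                 # expert: action 1 iff s > 0

supp_s = rng.normal(0.0, 1.5, size=200)
supp_a = np.where(rng.random(200) < 0.8,              # sub-optimal: 80% correct
                  (supp_s > 0).astype(int),
                  1 - (supp_s > 0).astype(int))

# Weighted behavioral cloning: a logistic policy trained on the union,
# with supplementary samples down-weighted by a hypothetical factor beta.
beta = 0.3
s = np.concatenate([expert_s, supp_s])
a = np.concatenate([expert_a, supp_a])
w = np.concatenate([np.ones_like(expert_s), beta * np.ones_like(supp_s)])

theta, bias = 0.0, 0.0
for _ in range(500):                                  # weighted logistic regression via GD
    p = 1.0 / (1.0 + np.exp(-(theta * s + bias)))
    grad_t = np.sum(w * (p - a) * s) / np.sum(w)
    grad_b = np.sum(w * (p - a)) / np.sum(w)
    theta -= 0.5 * grad_t
    bias -= 0.5 * grad_b

# Evaluate agreement with the expert on the expert's own states.
pred = (1.0 / (1.0 + np.exp(-(theta * expert_s + bias))) > 0.5).astype(int)
acc = np.mean(pred == expert_a)
print(f"agreement with expert on expert states: {acc:.2f}")
```

Down-weighting (rather than discarding) the supplementary set is one simple way to trade off its extra coverage against its out-of-expert-distribution noise, which is exactly the tension the abstract formalizes.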
Insights from the ICLR Peer Review and Rebuttal Process
Kargaran, Amir Hossein, Nikeghbal, Nafiseh, Yang, Jing, Ousidhoum, Nedjma
Peer review is a cornerstone of scientific publishing, including at premier machine learning conferences such as ICLR. As submission volumes increase, understanding the nature and dynamics of the review process is crucial for improving its efficiency, effectiveness, and the quality of published papers. We present a large-scale analysis of the ICLR 2024 and 2025 peer review processes, focusing on before- and after-rebuttal scores and reviewer-author interactions. We examine review scores, author-reviewer engagement, temporal patterns in review submissions, and co-reviewer influence effects. Combining quantitative analyses with LLM-based categorization of review texts and rebuttal discussions, we identify common strengths and weaknesses for each rating group, as well as trends in rebuttal strategies that are most strongly associated with score changes. Our findings show that initial scores and the ratings of co-reviewers are the strongest predictors of score changes during the rebuttal, pointing to a degree of reviewer influence. Rebuttals play a valuable role in improving outcomes for borderline papers, where thoughtful author responses can meaningfully shift reviewer perspectives. More broadly, our study offers evidence-based insights to improve the peer review process, guiding authors on effective rebuttal strategies and helping the community design fairer and more efficient review processes. Our code and score changes data are available at https://github.com/papercopilot/iclr-insights.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > California (0.14)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- (12 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
3ce257b311e5acf849992f5a675188e8-AuthorFeedback.pdf
We thank the reviewers for the positive comments and useful feedback. We provide responses to the main comments below. Connections to Cotter et al.: there are two main differences between our paper and Cotter et al. (2019a;b). Code: we will make TensorFlow code available. We will include a discussion on surrogates in Section 2. Non-Differentiable Constraints with Applications to Fairness, Recall, Churn, and Other Goals.
Common Q1: Theoretical justification on why A WP works
Based on previous work on the PAC-Bayes bound (Neyshabur et al., NeurIPS 2017) in adversarial training.
R#1 Q1: If the weights are constantly perturbed in the worst case, the model may find it difficult to learn.
R#1 Q2: How do the baseline methods that do implicit weight perturbations differ from AWP? We did not claim that "baseline methods do implicit weight perturbations".
R#1 Q3: What is the difference between the weights learned by AT-AWP and vanilla AT?
R#2 Q1: Only CIFAR-10 and single neural networks are tested. We have tested several network architectures and datasets in the main body and appendix, e.g., PreAct ResNet-18.
R#2 Q2: In Figure 1, is the α value in the loss landscape embedded into training or post-training?
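For context, AWP (adversarial weight perturbation) is usually formulated as a double perturbation over both inputs and weights. The following is a sketch of that objective under common notation, not a quotation from this rebuttal; the perturbation radii ε and γ and the relative weight-norm ball are assumptions:

```latex
% Double-perturbation objective commonly associated with AWP:
% perturb each input within an l_p ball of radius epsilon, and the
% weights within a ball whose size gamma scales with the weight norm.
\min_{\mathbf{w}} \;
\max_{\|\mathbf{v}\| \le \gamma \|\mathbf{w}\|} \;
\frac{1}{n} \sum_{i=1}^{n}
\max_{\|\boldsymbol{\delta}_i\|_p \le \epsilon}
\ell\!\left(f_{\mathbf{w}+\mathbf{v}}(x_i + \boldsymbol{\delta}_i),\, y_i\right)
```

The inner weight perturbation flattens the loss landscape around the learned weights, which is the property the PAC-Bayes argument cited above ties to robust generalization.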
our main claim that spectral modification can circumvent a fundamental weakness of spectral clustering
We thank the reviewers for their nice and helpful comments. Our modification algorithm is intended as a proof of concept of what we view as a general framework. There have been several recent works on graph embedding methods. Our experimental examples are well known and appreciated in spectral graph theory. We can report that the unsupervised versions of recent graph embedding methods (e.g.
Justifications for Democratizing AI Alignment and Their Prospects
Steingrüber, André, Baum, Kevin
The AI alignment problem comprises both technical and normative dimensions. While technical solutions focus on implementing normative constraints in AI systems, the normative problem concerns determining what these constraints should be. This paper examines justifications for democratic approaches to the normative problem -- where affected stakeholders determine AI alignment -- as opposed to epistocratic approaches that defer to normative experts. We analyze both instrumental justifications (democratic approaches produce better outcomes) and non-instrumental justifications (democratic approaches prevent illegitimate authority or coercion). We argue that normative and metanormative uncertainty create a justificatory gap that democratic approaches aim to fill through political rather than theoretical justification. However, we identify significant challenges for democratic approaches, particularly regarding the prevention of illegitimate coercion through AI alignment. Our analysis suggests that neither purely epistocratic nor purely democratic approaches may be sufficient on their own, pointing toward hybrid frameworks that combine expert judgment with participatory input alongside institutional safeguards against AI monopolization.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Germany > Saarland > Saarbrücken (0.04)
Review for NeurIPS paper: Improving model calibration with accuracy versus uncertainty optimization
Additional Feedback (Post Rebuttal): The authors have satisfactorily addressed all my concerns. Specifically, my major concern about the absence of a theoretical justification for this approach will be addressed by the authors incorporating R4's suggestion to discuss how the approach serves as a loss-calibrated inference method. This would certainly make the paper stronger. In this paper, the authors propose a modified loss function for improving the performance of uncertainty-aware DNNs. They show the applicability of their loss with a mean-field stochastic variational inference (SVI) based BNN.
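For context, the accuracy-versus-uncertainty (AvU) measure underlying this line of work counts predictions that are either accurate-and-certain or inaccurate-and-uncertain. A minimal sketch, assuming predictive entropy (or any scalar) as the uncertainty score and a hypothetical threshold `u_thresh`:

```python
import numpy as np

def avu(correct, uncertainty, u_thresh):
    """Accuracy-versus-Uncertainty (AvU): fraction of predictions that are
    accurate-and-certain or inaccurate-and-uncertain.
    `correct` is a boolean array, `uncertainty` a per-sample scalar score,
    and `u_thresh` a hypothetical certainty threshold."""
    certain = uncertainty < u_thresh
    n_ac = np.sum(correct & certain)        # accurate, certain
    n_au = np.sum(correct & ~certain)       # accurate, uncertain
    n_ic = np.sum(~correct & certain)       # inaccurate, certain
    n_iu = np.sum(~correct & ~certain)      # inaccurate, uncertain
    return (n_ac + n_iu) / (n_ac + n_au + n_ic + n_iu)

# Worked example: a well-calibrated model is certain when right and
# uncertain when wrong, so AvU is high.
correct = np.array([True, True, True, False, False])
unc = np.array([0.1, 0.2, 0.15, 0.9, 0.8])
print(avu(correct, unc, u_thresh=0.5))      # -> 1.0
```

A differentiable surrogate of this ratio can then be added to the training loss, which is the kind of modified objective the review describes.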
A Theoretical Justification for Asymmetric Actor-Critic Algorithms
Lambrechts, Gaspard, Ernst, Damien, Mahajan, Aditya
In reinforcement learning for partially observable environments, many successful algorithms were developed within the asymmetric learning paradigm. This paradigm leverages additional state information available at training time for faster learning. Although the proposed learning objectives are usually theoretically sound, these methods still lack a theoretical justification for their potential benefits. We propose such a justification for asymmetric actor-critic algorithms with linear function approximators by adapting a finite-time convergence analysis to this setting. The resulting finite-time bound reveals that the asymmetric critic eliminates an error term arising from aliasing in the agent state.
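The asymmetric structure described in the abstract can be sketched as follows: the critic is a linear function of the full state (available only at training time), while the actor conditions on an aliased observation. This is a toy illustration, not the paper's algorithm or analysis; the dynamics, reward, and aliasing map are assumptions chosen only to exercise the updates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Six states collapse pairwise onto three observations (aliasing).
n_states, n_obs, n_actions = 6, 3, 2
obs_of = np.array([0, 0, 1, 1, 2, 2])

phi_s = np.eye(n_states)            # critic features: full state (asymmetric)
w = np.zeros(n_states)              # critic weights, V(s) = w . phi_s(s)
theta = np.zeros((n_obs, n_actions))  # actor logits per observation

def policy(o):
    z = theta[o] - theta[o].max()   # softmax over actions, numerically stable
    p = np.exp(z)
    return p / p.sum()

gamma, alpha_c, alpha_a = 0.9, 0.1, 0.05
s = 0
for _ in range(2000):
    o = obs_of[s]
    p = policy(o)
    a = rng.choice(n_actions, p=p)
    s2 = (s + 1 + a) % n_states             # toy dynamics
    r = 1.0 if a == (s % 2) else 0.0        # reward depends on aliased parity
    # Asymmetric critic: TD(0) on the *state*, not the observation.
    td = r + gamma * w[s2] - w[s]
    w += alpha_c * td * phi_s[s]
    # Actor: policy-gradient step on the *observation*; critic supplies td.
    grad = -p
    grad[a] += 1.0
    theta[o] += alpha_a * td * grad
    s = s2
```

Because the reward here depends on state parity that the observation hides, the state-based critic evaluates the policy without the aliasing error that an observation-based critic would incur, which is the error term the paper's finite-time bound shows being eliminated.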
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)